ABSTRACT

The  computational  efficiency  of  the  One  Bit  Spectral  Correlation  Algorithm  is  compared  to  other 
I.   INTRODUCTION
1.   BACKGROUND

a.  SPECTRAL   CORRELATION  ANALYSIS 

Spectral  correlation  analysis  is  a  method  of  signal 
analysis  which  takes  advantage  of  the  cyclostationary 
properties  found  in  many  communications  signals.  Most  digital 
modulation  schemes  produce  signals  with  underlying 
periodically  time-variant  structures.  These  structures  give 
rise  to  observed  functions  of  time  which  are  successfully 
modelled  as  cyclostationary  waveforms.  This  is  especially 
true  in  connection  with  detection,  estimation, 
synchronization,  and  emitter  location.  The  spectral 
correlation  function (SCF)  is  the  cross  spectrum  of  a  signal 
with a time or frequency shifted version of itself. The SCF
and  cyclostationary  waveform  theories  are  discussed  fully  in 
references  1  through  6. 

The SCF gives rise to the cyclic cross spectrum, a
three dimensional plot of the signal on the bifrequency plane.
Figure 1 shows the cyclic cross spectrum of a BPSK signal
plotted on the bifrequency plane. The horizontal, or
frequency, axis is denoted by $f$. The vertical axis is called
the cyclic frequency axis and is denoted by $\alpha$. Each feature
on the plane lies along a line of constant $\alpha$. The large
feature in the background is the normal signal power spectrum
and lies along the $\alpha = 0$ axis. The tallest feature in the
foreground lies at twice the carrier frequency, and the two
closest features on either side are due to the data.

Because many modulated signals have a unique cyclic
spectrum, cyclic spectrum analysis is particularly well suited
for signal detection, modulation recognition, signal
parameter estimation, and the design of communication systems.
However, cyclic spectrum analysis has a very high
computational complexity, which limits its use as a signal and
systems analysis tool. The basic operations in cyclic spectrum
analysis techniques are Fourier transformation, convolution,
and product modulation, which are common to most signal
processing algorithms [Ref. 7]. The sheer number of
calculations required for cyclic spectrum analysis far exceeds
that of conventional spectral analysis and often proves too
great for general purpose computers.

b.       METHODS  OF  COMPUTATION 

In general, there are two broad categories of cyclic
spectrum analysis algorithms: time-smoothing and
frequency-smoothing techniques. Smoothing is the term used to
describe the process by which unwanted irregularities are
removed from the cyclic periodogram. The basic time and
frequency smoothing techniques are developed in reference 3.
Time-smoothing techniques include the FFT Accumulation Method
(FAM) and the Strip Spectral Correlation Algorithm (SSCA). These
methods were introduced in reference 4 and discussed in
references 8 and 9. In reference 10, it is shown that
these techniques are much more computationally efficient than
the basic Frequency Smoothing Method (FSM). This is because
time smoothing techniques require fewer computations on
average than their frequency smoothing counterparts. However,
reference 7 introduced the One Bit Spectral Correlation
Algorithm (OBSCA). OBSCA is a variation of FSM which has some
interesting properties that offer attractive advantages over
the less computationally intense time smoothing algorithms.
Each of these algorithms performs the computations necessary to
calculate the entire cyclic cross spectrum.

c.  TDOA  APPLICATION

The One Bit Spectral Correlation Algorithm is
particularly well suited for time difference of arrival (TDOA)
applications. The two primary methods for computing the TDOA
of a signal are the Spectral Correlation Ratio Method
(SPECCORR) and the Spectral Coherence Alignment Method
(SPECCOA). Both methods require the same signal to be received
at each of two receivers. The manner of computation each method
employs is slightly different.


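The algorithm for SPECCORR [Ref. 11] is

$$\hat{D} = \arg\max_{\tau} \{ \hat{b}_{\alpha}(\tau) \}$$   (1)

where the argument is an estimate of

$$b_{\alpha}(\tau) \triangleq \left| \int_{|f - f_{\alpha}| < B_{\alpha}/2} \frac{S_{yx}^{\alpha}(f)}{S_{x}^{\alpha}(f)} \, e^{i 2\pi f \tau} \, df \right|$$   (2)

The algorithm for SPECCOA is

$$\hat{D} = \arg\max_{\tau} \{ \hat{d}_{\alpha}(\tau) \}$$   (3)

and the argument is an estimate of

$$d_{\alpha}(\tau) = \mathrm{Re}\left\{ \int S_{yx}^{\alpha}(f) \, S_{x}^{\alpha}(f)^{*} \, e^{i 2\pi (f + \alpha/2) \tau} \, df \right\}$$   (4)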


As can be noted from the above equations, both
SPECCORR and SPECCOA require the computation of $S_{x}^{\alpha}$ in
order to determine the location of any features of interest and
then the computation of $S_{yx}^{\alpha}$ along that $\alpha$ to perform
the TDOA calculation. It is the calculation of $S_{yx}^{\alpha}(f)$ for
a specific $\alpha$ and all $f$ where the OBSCA algorithm shows the
most promise.
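As an illustration of how a SPECCOA style search over candidate delays might be carried out once the two spectra along a single cyclic frequency are available, the following Python sketch may help. It assumes the statistic has the form Re{ sum over f of S_yx(f) S_x(f)* exp(i 2 pi (f + alpha/2) tau) } and that a pure delay D gives S_yx(f) = S_x(f) exp(-i 2 pi (f + alpha/2) D). The function names, the frequency grid, and the synthetic spectrum are hypothetical and are not taken from the references.

```python
import cmath
import math


def speccoa_statistic(s_yx, s_x, freqs, alpha, tau):
    """Riemann-sum approximation of the assumed SPECCOA statistic at delay tau."""
    total = 0j
    for syx, sx, f in zip(s_yx, s_x, freqs):
        phase = cmath.exp(2j * math.pi * (f + alpha / 2) * tau)
        total += syx * sx.conjugate() * phase
    return total.real


def speccoa_tdoa(s_yx, s_x, freqs, alpha, candidate_taus):
    """Return the candidate delay that maximizes the statistic."""
    return max(candidate_taus,
               key=lambda tau: speccoa_statistic(s_yx, s_x, freqs, alpha, tau))


# Synthetic example (all values are illustrative assumptions).
alpha = 0.25
true_delay = 7
freqs = [k / 64 - 0.5 for k in range(64)]
s_x = [complex(1.0 + 0.5 * math.cos(2 * math.pi * f), 0.3 * f) for f in freqs]
# Assumed delay model: y(t) = x(t - D) multiplies the spectrum by a phase ramp.
s_yx = [sx * cmath.exp(-2j * math.pi * (f + alpha / 2) * true_delay)
        for sx, f in zip(s_x, freqs)]

estimate = speccoa_tdoa(s_yx, s_x, freqs, alpha, range(0, 32))
```

In this sketch only one line of constant cyclic frequency is needed, which is the case the single cyclic frequency analysis later in the paper addresses.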


2.   THESIS  GOALS 

This paper takes a closer look at OBSCA's unique
properties and examines the potential advantages in computing
the cyclic cross spectrum. Comparisons are made against the
FAM and FSM algorithms on the basis of the Hardware Complexity
Product [Ref. 10] for both the entire bifrequency plane and
the special case where there is a specific cyclic frequency of
interest. While SSCA is similar in performance to the FAM
over the entire bifrequency plane, it cannot be simplified in
a manner which is advantageous in calculating the special case
of a single $\alpha$ of interest. Therefore, only FAM and FSM
will be used for comparison with OBSCA.

A highly parallel system architecture for OBSCA was
proposed in reference 7. That architecture was proposed for
the calculation of the entire cyclic cross spectrum. This
paper analyzes the computation requirements for the algorithm
and proposes a system architecture which incorporates those
ideas in a manner which allows the OBSCA algorithm to be
implemented so as to compute $S_{x}^{\alpha}(f)$ and $S_{yx}^{\alpha}(f)$
along a line of given $\alpha$.


II.   ONE  BIT  SPECTRAL  CORRELATION  ALGORITHM 

A.   INTRODUCTION  AND  THEORY 

The  One  Bit  Spectral  Correlation  Algorithm  is  based  on  the 
Digital  Frequency  Smoothing  Method  of  computing  the  cyclic 
cross  spectrum.  A  frequency  smoothed  point  estimate  of  the 
bifrequency plane at a point $(f_j, \alpha_i)$ is

M  is  restricted  to  be  an  even  integer.   The  scaling  factor 
associated  with  equation  (5)  is 


J         J    2Mw=-M/2  2  2 


The cyclic frequency coordinate is given by $\alpha_i = i/N$ and
the spectral frequency coordinate is given by $f_j = j/N$, where
$N = \Delta t$ is the number of samples to be processed assuming a
unity sampling rate. $X_{\Delta t}(k)$ and $Y_{\Delta t}(k)$ are Fourier
transformations of the sampled sequences $x(n)$ and $y(n)$, which
are derived, in turn, by sampling $x(t)$ and $y(t)$ at a rate of
$f_s$, assumed to be unity for subsequent developments. $y(t)$ is
generally a time shifted version of $x(t)$ such that
$y(t) = x(t - t_0)$.

In order to compute the entire bifrequency plane,
equations (5) and (6) will need to be applied to the whole
plane in some efficient manner. As discussed in references 4
and 9, each point estimate in the plane has a region of
support called a Cyclic Spectrum Analyzer (CSA) cell. The
cells are approximated as rectangles with the length
determined by $\Delta f$ and the width by $\Delta\alpha$. It is shown in
reference 12 that the frequency resolution is $\Delta f = M/N$ and
the cyclic frequency resolution is $\Delta\alpha = 1/N$. For a point
estimate to be statistically reliable, the time-frequency
resolution product, $\Delta t \Delta f = M$, must be much greater
than one [Ref. 3].
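The resolution relationships above can be checked numerically. The short Python sketch below is only an illustration; the function name and return layout are hypothetical, and a unity sampling rate is assumed so that the collect time equals the number of samples N.

```python
from fractions import Fraction


def csa_resolutions(n_samples: int, m_smooth: int):
    """Return (delta_f, delta_alpha, time-frequency product) for given N and M."""
    if m_smooth % 2 != 0:
        raise ValueError("M is restricted to be an even integer")
    delta_f = Fraction(m_smooth, n_samples)   # frequency resolution M/N
    delta_alpha = Fraction(1, n_samples)      # cyclic frequency resolution 1/N
    tf_product = n_samples * delta_f          # delta_t * delta_f, equal to M
    return delta_f, delta_alpha, tf_product


# N = 16 and M = 2 give delta_f = 1/8, the value used in the small example
# of reference 7 quoted later in the text.
delta_f, delta_alpha, tf_product = csa_resolutions(16, 2)
```

Exact fractions are used here so that values such as 1/8 are represented without rounding.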

The  CSA  cells  are  used  to  tile  the  bifrequency  plane  in 
such  a  manner  that  there  is  no  overlap  and  that  the  gaps  are 
minimized.  This  will  ensure  that  for  the  entire  collection 
interval,  the  point  estimates  will  be  calculated  so  that  the 
CSA  cells  cover  the  whole  cyclic  cross  spectrum.  This 
requires that the CSA cells be contiguous and non-overlapping
in both $f$ and $\alpha$ [Ref. 12].

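The entire cyclic cross spectrum [Ref. 7] is

$$S_{xy_{\Delta t}}^{\alpha}(f)_{\Delta f} = \{ S_{xy_{\Delta t}}^{\alpha_i}(f_j)_{\Delta f} : -Q \le i \le Q, \; -R \le j \le R \}$$   (7)

where

$$Q = N - M$$   (8)

and

$$R = \frac{N - M - |i|}{2M}$$   (9)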

Figure 2 illustrates such a pattern of CSA cells for
$\Delta f = 1/8$ and $\Delta t \Delta f = 4$. In most cases, meaningful
estimation requires $\Delta t \Delta f > 512$. This is especially true
for weaker signals down in the noise [Ref. 7]. However,
$\Delta f = 1/8$ is often encountered in practice, since $\Delta f$ need
be no smaller than one half the bandwidth of the signal of
interest [Ref. 10].

The complex sign detector function in the OBSCA equation,
written here as $\Phi[\cdot]$, extracts the signs of the real and
imaginary parts of its argument. This function is generally
applied to the time or frequency shifted spectral sequence prior
to the correlation computation. The output of the complex sign
detector needs to be rotated $\pi/4$ radians in order to reduce the
complex multiplies of the correlation computation to simple sign
changes and data multiplexing operations [Ref. 7].

Table 1 summarizes how all this works. Column 1 shows the
four possible sign combinations for a complex number, that is,
the output $\Phi[Y]$ of the complex sign detector. Column 2 shows
the sign bits clipped from the rest of the data. Column 3
rotates the sign bits by $\pi/4$ radians and column 4 takes the
complex conjugate of column 3. Column 3 is needed in equation
(6) while column 4 is used in equation (5). Column 5
demonstrates the multiplexing function of these sign bits as
applied to an arbitrary complex signal $X = (x_r, x_i)$.
Similarly for column 6 with an arbitrary complex signal
$Y = (y_r, y_i)$.

  (1)        (2)              (3)                   (4)                     (5)                      (6)
$\Phi[Y]$   $SB(\Phi[Y])$   $\Phi[Y]e^{i\pi/4}$   $(\Phi[Y]e^{i\pi/4})^{*}$   $X(\Phi[Y]e^{i\pi/4})^{*}$   $Y^{*}(\Phi[Y]e^{i\pi/4})$

(1,1)       (0,0)           (0,1)                 (0,-1)                  $(x_i, -x_r)$             $(y_i, y_r)$
(-1,1)      (1,0)           (-1,0)                (-1,0)                  $(-x_r, -x_i)$            $(-y_r, y_i)$
(-1,-1)     (1,1)           (0,-1)                (0,1)                   $(-x_i, x_r)$             $(-y_i, -y_r)$
(1,-1)      (0,1)           (1,0)                 (1,0)                   $(x_r, x_i)$              $(y_r, -y_i)$

Table 1: OBSCA decoding array operation

After the data is multiplexed by the appropriate
decoding array it is passed on to two accumulators (one each for
the real and imaginary parts). These accumulators require $M$
additions to complete an estimate and then the output is
multiplied by the scaling factor associated with that CSA
cell. Figure 3 is a basic block diagram of this process
[Ref. 7].
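The multiplexing idea behind column 5 of Table 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of reference 7: the function names are hypothetical, a zero real or imaginary part is treated as positive, and the constant gain of sqrt(2) introduced by the rotation is dropped, as it is in the table.

```python
import cmath
import math


def sign_bits(y: complex) -> tuple:
    """Clip the sign bits of y: 1 marks a negative part, 0 a non-negative part."""
    return (int(y.real < 0), int(y.imag < 0))


def decode_multiply(x: complex, bits: tuple) -> complex:
    """Column 5 of Table 1: multiply x by the conjugate of the rotated sign
    value using only swaps and sign changes of (x_r, x_i)."""
    xr, xi = x.real, x.imag
    if bits == (0, 0):
        return complex(xi, -xr)
    if bits == (1, 0):
        return complex(-xr, -xi)
    if bits == (1, 1):
        return complex(-xi, xr)
    return complex(xr, xi)  # bits == (0, 1)


def reference_multiply(x: complex, y: complex) -> complex:
    """Same product computed with a full complex multiply, for comparison."""
    phi = complex(1 if y.real >= 0 else -1, 1 if y.imag >= 0 else -1)
    rotated = phi * cmath.exp(1j * math.pi / 4)
    return x * rotated.conjugate() / math.sqrt(2)


# One sample per quadrant of y (values are arbitrary illustrations).
samples = [(complex(0.3, -1.2), complex(2.0, 5.0)),
           (complex(-4.0, 0.7), complex(-1.5, 0.2)),
           (complex(1.1, 1.9), complex(-0.4, -3.0)),
           (complex(-2.2, -0.6), complex(6.0, -0.1))]
max_error = max(abs(decode_multiply(x, sign_bits(y)) - reference_multiply(x, y))
                for x, y in samples)
```

The comparison shows that, up to the dropped gain, the swap and sign change operations reproduce the complex multiply, which is why no real multiplications are needed inside the correlator.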

B.   ALGORITHMIC  COMPARISONS 

1.   ENTIRE  BIFREQUENCY  PLANE 

The Hardware Complexity Product, $P_{hu} F_T$ [Ref. 10], is used
as a measure of the relative complexity of a particular
architecture in the analysis of both FAM and FSM in computing
the spectral correlation function. $P_{hu}$ is simply the number
of hardware units needed to accomplish the given computations,
while the Factor of Real Time, $F_T$, is used as an aid in
characterizing the closeness to real time computations of each
of the different methods.

$$F_T = \frac{\text{computation time}}{\text{collect time}}$$   (10)

$P_{hu} F_T = C_u / \Delta t$, where $C_u$ is the number of computations
needed and $\Delta t$ is the total number of samples processed. The
Hardware Complexity Product is useful because for real time
calculations $F_T = 1$, and $P_{hu}$ will then represent the total
number of hardware units needed to operate in parallel to
achieve real time. In the case where only one hardware unit is
available, $P_{hu} = 1$, the Factor of Real Time gives an objective
view of the length of time needed to perform all the necessary
computations. For convenience, all complex butterflies,
multipliers, and adders will be assumed to be rate-1 and
radix-2.
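The two readings of the Hardware Complexity Product described above can be illustrated with a few lines of Python. The function names are hypothetical, and rounding the unit count up to a whole number of hardware units is an assumption made here for illustration.

```python
import math


def hardware_complexity_product(computations: float, samples: int) -> float:
    """P_hu * F_T = C_u / delta_t for rate-1 hardware units."""
    return computations / samples


def units_for_real_time(computations: float, samples: int) -> int:
    """Hardware units needed when F_T = 1 (rounded up, an assumption)."""
    return math.ceil(hardware_complexity_product(computations, samples))


def factor_of_real_time(computations: float, samples: int, units: int = 1) -> float:
    """F_T obtained when a fixed number of hardware units is available."""
    return hardware_complexity_product(computations, samples) / units


# Illustrative numbers: 2 N log2(N) operations for N = 1024 samples.
n = 1024
operations = 2 * n * math.log2(n)
product = hardware_complexity_product(operations, n)
```

With a single unit the product is the Factor of Real Time itself; with enough units to bring the factor down to one, it is the unit count.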

Tables 2 and 3 summarize the complexity analysis for the
major sections of the FSM and FAM realizations, respectively
[Ref. 10]. Because $S_{x}^{\alpha}(f)$ is concerned only with real
valued signals, the inherent symmetry of the function makes it
necessary to calculate only the first quadrant of the
bifrequency plane.

                                ($N^2/4M$ required)
             FFT                Correlator     Summer

CBF          $(N/2)\log_2 N$    -              -
Cpx Mpy      $(N/2)\log_2 N$    $M$            -
Cpx Add      $N\log_2 N$        -              $M$
Real Mpy     $2N\log_2 N$       $4M$           -
Real Add     $3N\log_2 N$       $2M$           $2M$

Table 2: Complexity summary for FSM


a.  FSM

From Table 2, the number of real multiplies for the
FSM algorithm over the entire bifrequency plane is

$$C_{rm} = 2N\log_2 N + \frac{N^2}{4M}(4M)$$   (11)

Remembering $N = \Delta t$ and $M = \Delta t \Delta f$, equation (11) becomes

$$C_{rm} = 2\Delta t\log_2 \Delta t + (\Delta t)^2$$   (12)

and therefore

$$P_{rm} F_T = \frac{C_{rm}}{\Delta t} = 2\log_2 \Delta t + \Delta t$$   (13)

Also from Table 2, the equation for the number of real
additions for FSM is

$$C_{ra} = 3N\log_2 N + \frac{N^2}{4M}(4M)$$   (14)

and the Hardware Complexity Product becomes

$$P_{ra} F_T = 3\log_2 \Delta t + \Delta t$$   (15)


             ($P$ required)                                        ($(N')^2/4$ required)
             Wndw     $N'$ FFT             Down Convt.     Correl. Multi.     $P$ FFT

CBF          -        $(N'/2)\log_2 N'$    -               -                  $(P/2)\log_2 P$
Cpx Mpy      -        $(N'/2)\log_2 N'$    $N'$            $P$                $(P/2)\log_2 P$
Cpx Add      -        $N'\log_2 N'$        -               -                  $P\log_2 P$
Real Mpy     $N'$     $2N'\log_2 N'$       $4N'$           $4P$               $2P\log_2 P$
Real Add     -        $3N'\log_2 N'$       $2N'$           $2P$               $3P\log_2 P$

Table 3: Complexity summary for FAM


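b.  FAM

Similarly, from Table 3, the number of real multiplies
for FAM over the entire plane is

$$C_{rm} = 8N\log_2\frac{N}{M} + 2\frac{N^2}{M}\log_2 4M + 4\frac{N^2}{M} + 20N$$   (16)

and so the Hardware Complexity Product becomes

$$P_{rm} F_T = 8\log_2\left(\frac{1}{\Delta f}\right) + 2\left(\frac{1}{\Delta f}\right)\log_2 4\Delta t \Delta f + 4\left(\frac{1}{\Delta f}\right) + 20$$   (17)

The number of real additions for FAM is also taken from
Table 3.

$$C_{ra} = 12N\log_2\frac{N}{M} + 3\frac{N^2}{M}\log_2 4M + 2\frac{N^2}{M} + 8N$$   (18)

Therefore

$$P_{ra} F_T = 12\log_2\left(\frac{1}{\Delta f}\right) + 3\left(\frac{1}{\Delta f}\right)\log_2 4\Delta t \Delta f + 2\left(\frac{1}{\Delta f}\right) + 8$$   (19)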


c.       OBSCA 

For OBSCA, the calculations are not as
straightforward. Similarly to the calculations for FSM, the
number of real multiplies for OBSCA is $C_{rm} = 4(C_{BF} + C_s)$,
where $C_{BF}$ is the number of complex butterflies and $C_s$ is the
number of scaling factors required over the bifrequency plane.
The factor of 4 is due to the fact that each scaling factor is a
complex quantity which must be multiplied with its respective
CSA cell, and there are 4 real multiplies to each complex one.
For the entire bifrequency plane, the number of scaling
factors is given by

$$C_s = \frac{1}{2}\left(\frac{N}{M}\right)^2 - \frac{N}{M} + 1$$   (20)

However, due to the fact that the other two algorithms are
computed only for the first quadrant of the bifrequency plane,
the factor of 4 applied to $C_s$ is canceled. Therefore, the
Hardware Complexity Product for OBSCA becomes

$$P_{rm} F_T = 2\log_2 \Delta t + \frac{1}{2}\left(\frac{1}{\Delta t \Delta f^2}\right) - \left(\frac{1}{\Delta t \Delta f}\right) + \left(\frac{1}{\Delta t}\right)$$   (21)
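The number of real additions for OBSCA is
$C_{ra} = 6C_{BF} + 2C_s + C_{sum}$, where $C_{sum}$ is the total number of
additions required in all the accumulators over the entire
bifrequency plane. From equation (5), there are $M$ additions
for each change in either $i$ or $j$. So, from equation (7),
$C_{sum} = M \cdot [\text{range of } i] \cdot [\text{range of } j] = M \cdot 2Q \cdot 2R$, or

$$C_{sum} = M\,(2(N - M))\left(2\,\frac{N - M - |i|}{2M}\right) = 2(N - M)(N - M - |i|)$$   (22)

Let $i = 0$ in order to calculate the worst case total of
operations. Then

$$C_{sum} = 2(N - M)^2 = 2(N^2 - 2NM + M^2) = 2N^2 - 4NM + 2M^2$$   (23)

and

$$2C_s = 2\left(\frac{1}{2}\left(\frac{N}{M}\right)^2 - \frac{N}{M} + 1\right) = \left(\frac{N}{M}\right)^2 - 2\frac{N}{M} + 2$$   (24)

So the total of real additions for OBSCA is

$$C_{ra} = 3N\log_2 N + 2N^2 - 4NM + 2M^2 + \left(\frac{N}{M}\right)^2 - 2\frac{N}{M} + 2$$   (25)

Again, removing a factor of 4 from $C_s$ and $C_{sum}$ to
indicate the number of calculations in the first quadrant only

$$P_{ra} F_T = 3\log_2 \Delta t + \frac{\Delta t}{2} - \Delta t \Delta f + \frac{\Delta t \Delta f^2}{2} + \frac{1}{4 \Delta t \Delta f^2} - \frac{1}{2 \Delta t \Delta f} + \frac{1}{2 \Delta t}$$   (26)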
Figures 4 and 5 illustrate the relationships between
these equations quite clearly. The Hardware Complexity
Product is plotted against the time-frequency resolution
product on log-log axes. This shows that although there is an
enormous savings in the number of real multiplies, OBSCA is
still not as advantageous as FAM over the whole plane. The low
number of real multiplies is offset by the rapid increase in
the number of real additions as $\Delta t \Delta f$ increases.
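The whole-plane comparison can be reproduced numerically from the operation counts in Tables 2 and 3 and the OBSCA relations $C_{rm} = 4(C_{BF} + C_s)$ and $C_{ra} = 6C_{BF} + 2C_s + C_{sum}$. The Python sketch below is an illustration only. It assumes $N' = N/M$ and $P = 4M$ for FAM (the latter inferred from the $\log_2 4M$ terms), the scaling factor count $C_s = (N/M)^2/2 - N/M + 1$, the worst case $C_{sum} = 2(N - M)^2$, and the first-quadrant reduction by 4 of $C_s$ and $C_{sum}$; all function names are hypothetical.

```python
import math


def scaling_factor_count(n: int, m: int) -> float:
    """Assumed number of scaling factors (partitions) over the whole plane."""
    return 0.5 * (n / m) ** 2 - n / m + 1


def hcp_whole_plane(n: int, m: int) -> dict:
    """First-quadrant Hardware Complexity Products (operations per sample)."""
    log_n = math.log2(n)
    cells = n ** 2 / (4 * m)                # FSM correlator/summer count
    n_prime, p = n / m, 4 * m               # assumed FAM parameters
    log_np, log_p = math.log2(n_prime), math.log2(p)
    pairs = n_prime ** 2 / 4                # FAM correlators in one quadrant
    c_bf = (n / 2) * log_n                  # complex butterflies of the N-point FFT
    c_s = scaling_factor_count(n, m)
    c_sum = 2 * (n - m) ** 2                # worst case, i = 0
    return {
        "FSM": {"multiplies": (2 * n * log_n + cells * 4 * m) / n,
                "adds": (3 * n * log_n + cells * 4 * m) / n},
        "FAM": {"multiplies": (p * (5 * n_prime + 2 * n_prime * log_np)
                               + pairs * (4 * p + 2 * p * log_p)) / n,
                "adds": (p * (2 * n_prime + 3 * n_prime * log_np)
                         + pairs * (2 * p + 3 * p * log_p)) / n},
        "OBSCA": {"multiplies": (4 * c_bf + c_s) / n,
                  "adds": (6 * c_bf + (2 * c_s + c_sum) / 4) / n},
    }


# Illustrative values: N = 2**22 samples and M = 4096.
whole = hcp_whole_plane(2 ** 22, 4096)
```

For these values the sketch gives OBSCA by far the fewest real multiplies but many more real additions than FAM, which matches the trend described for Figures 4 and 5.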

2.   SINGLE  CYCLIC  FREQUENCY  OF  INTEREST 

For applications such as time difference of arrival,
frequency difference of arrival, signal classification, and
parameter measurement, it is useful to look at a particular
feature on the bifrequency plane. So, instead of computing
the first quadrant of the plane, there are many situations
where it is necessary to perform only those calculations
needed for a single $\alpha_0$ and all $f_j$. Such situations include
looking for a feature occurrence at a known $\alpha$, as well as
using SPECCORR or SPECCOA to measure the time difference of
arrival.

a.  FSM 

The number of calculations due to the complex
butterflies will not change. While the following equations
are again discussed in terms of the first quadrant of the
bifrequency plane, if $S_{xy}^{\alpha}(f)$ is going to be calculated for
SPECCORR or SPECCOA, it will be necessary to include the
negative frequencies also. Equations (27) through (38) can be
easily adjusted by multiplying all but the first term by a
factor of 2.

So, the number of real multiplies for FSM along a
given $\alpha_0$ is

$$C_{rm} = 2N\log_2 N + \frac{N}{2M}(4M)$$   (27)

and

$$P_{rm} F_T = \frac{C_{rm}}{\Delta t} = 2\log_2 \Delta t + 2$$   (28)

The number of real additions is given by

$$C_{ra} = 3N\log_2 N + \frac{N}{2M}(4M)$$   (29)

and

$$P_{ra} F_T = 3\log_2 \Delta t + 2$$   (30)

b.       FAM 

A similar argument follows for FAM. The number of
calculations performed by the complex butterflies remains
unchanged. The number of calculations required along one
dimension of the bifrequency plane reduces from $(N')^2/4$ down
to just $N'/2$. With $N' = N/M$, the number of single $\alpha$
calculations becomes $N/(2M)$. The number of real multiplies
reduces from equation (16) to

$$C_{rm} = 8N\log_2\frac{N}{M} + 4N\log_2 4M + 28N$$   (31)

and

$$P_{rm} F_T = 8\log_2\left(\frac{1}{\Delta f}\right) + 4\log_2 4\Delta t \Delta f + 28$$   (32)

The same reduction is apparent for the real additions,

$$C_{ra} = 12N\log_2\frac{N}{M} + 6N\log_2 4M + 12N$$   (33)

and therefore

$$P_{ra} F_T = 12\log_2\left(\frac{1}{\Delta f}\right) + 6\log_2 4\Delta t \Delta f + 12$$   (34)


c.       OBSCA

Along a single line of $\alpha$, the OBSCA calculation for
positive $f$ is relatively simple. The number of operations due
to the complex butterflies is the same as FSM. While the
relationship for the real multiplies is exactly the same as
for the whole plane, $C_{rm} = 4(C_{BF} + C_s)$, $C_s$ is now much less.
The largest number of scaling factors is along the $f$ axis where
$\alpha_0 = 0$. In this case, for a single $\alpha$ and positive $f$,
$C_s = 0.5[(N/M) - 1]$. So

$$C_{rm} = 2N\log_2 N + 2\frac{N}{M} - 2$$   (35)

and

$$P_{rm} F_T = \frac{C_{rm}}{\Delta t} = 2\log_2 \Delta t + 2\left(\frac{1}{\Delta t \Delta f}\right) - 2\left(\frac{1}{\Delta t}\right)$$   (36)

The formula for real additions is also the same,
$C_{ra} = 6C_{BF} + 2C_s + C_{sum}$. But $C_{sum}$, as well as $C_s$, is
different. In equation (5), there are $M$ additions for each $i$
and $j$. Here $i$ is a constant. From equation (7), then,
$C_{sum} = M \cdot [\text{range of } j] = 2MR$. With $i = 0$, and substituting
equation (9) for $R$, $C_{sum} = N - M$. And for positive $f$,
$C_{sum} = 0.5(N - M)$. Also, $2C_s = 2\{0.5[(N/M) - 1]\} = (N/M) - 1$.
Therefore, the total number of real additions is

$$C_{ra} = 3N\log_2 N + \frac{N - M}{2} + \frac{N}{M} - 1$$   (37)

and

$$P_{ra} F_T = 3\log_2 \Delta t + \frac{1}{2} - \frac{\Delta f}{2} + \left(\frac{1}{\Delta t \Delta f}\right) - \left(\frac{1}{\Delta t}\right)$$   (38)

It is important to notice that because the double
summation in equation (7) is reduced to a single summation for
a given $\alpha_0$, the number of real additions has only a factor of
$N$ in the second term of equation (37) instead of the $N^2$ in
equation (25). This factor then drops completely out of
equation (38). This rids OBSCA of the rapid rise in the number
of real additions which is the reason that the algorithm is not
competitive in calculating the whole bifrequency plane. It is
also interesting to note that equation (38) is not
significantly less than equation (30) for FSM, especially in
light of the fact that the last three terms in equation (38)
are considerably less than one and can be ignored compared to
the first two terms. Thus, it is surprising to see that FSM is
extremely competitive with OBSCA in computing a single line
of $\alpha$.

Figures 6 and 7 show the Hardware Complexity Product of
the real multiplications and real additions for a single $\alpha$,
respectively. As in Figures 4 and 5, these are plotted
against the time-frequency resolution product in order to
better display the tendencies for useful values of
$\Delta t \Delta f$. As shown, OBSCA is much more efficient than FAM for
a single $\alpha$, but FSM is also shown to be very close to OBSCA.
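The single cyclic frequency comparison can be checked in the same way as the whole-plane case. The Python sketch below is illustrative only and rests on the same assumptions ($N' = N/M$ and $P = 4M$ for FAM), together with the single line counts stated in the text: $N/(2M)$ cells for FSM and FAM, $C_s = 0.5[(N/M) - 1]$, and $C_{sum} = 0.5(N - M)$ for OBSCA. The function name is hypothetical.

```python
import math


def hcp_single_alpha(n: int, m: int) -> dict:
    """Hardware Complexity Products along one line of constant cyclic frequency."""
    log_n = math.log2(n)
    cells = n / (2 * m)                     # cells along the line for FSM
    n_prime, p = n / m, 4 * m               # assumed FAM parameters
    log_np, log_p = math.log2(n_prime), math.log2(p)
    pairs = n_prime / 2                     # FAM correlators along the line
    c_bf = (n / 2) * log_n                  # complex butterflies of the N-point FFT
    c_s = 0.5 * (n / m - 1)                 # scaling factors, positive f only
    c_sum = 0.5 * (n - m)                   # accumulator additions, positive f only
    return {
        "FSM": {"multiplies": (2 * n * log_n + cells * 4 * m) / n,
                "adds": (3 * n * log_n + cells * 4 * m) / n},
        "FAM": {"multiplies": (p * (5 * n_prime + 2 * n_prime * log_np)
                               + pairs * (4 * p + 2 * p * log_p)) / n,
                "adds": (p * (2 * n_prime + 3 * n_prime * log_np)
                         + pairs * (2 * p + 3 * p * log_p)) / n},
        "OBSCA": {"multiplies": 4 * (c_bf + c_s) / n,
                  "adds": (6 * c_bf + 2 * c_s + c_sum) / n},
    }


# Illustrative values: N = 2**22 samples and M = 4096.
single = hcp_single_alpha(2 ** 22, 4096)
gap_adds = single["FSM"]["adds"] - single["OBSCA"]["adds"]
```

For these values OBSCA and FSM differ by less than two operations per sample, while FAM needs several times more, in line with the behavior described for Figures 6 and 7.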

C.   BANDWIDTH  ADVANTAGE 

Because of the manner in which OBSCA is calculated, there
is an advantage in using OBSCA in TDOA calculations. The two
methods, SPECCORR and SPECCOA, require that both the
auto-correlation, $S_{x}^{\alpha}(f)$, and the cross correlation,
$S_{xy}^{\alpha}(f)$, be computed. OBSCA can reduce the amount of data
which needs to be transferred. Under normal circumstances, two
separate receivers are used to receive the signals. For the
sake of argument, let receiver one (R1) receive the signal
$x(t)$ and perform the computations, and receiver two (R2)
receive the time shifted signal $y(t) = x(t - t_0)$. In order for
R1 to implement the SPECCOA algorithm, R2 must send all $N$
samples of complex data to R1. Let $n$ be the number of bits
required for each of the real and imaginary parts. Then, R2
must send $2Nn$ bits of data. R1 can then compute $S_{xy}^{\alpha}$ and
$S_{x}^{\alpha}$ for the SPECCOA algorithm.

Examination of equation (5) reveals that the complex sign
detector function only relies on the spectral data from
$y(t)$, while equation (6) shows that the scaling factors can
also be computed solely from this same data. Therefore, in a
similar situation where R1 receives $x(t)$ and R2 receives
$y(t)$, the transmission bandwidth required is less by at least
a factor of 10. Once both signals are received, R1 can
determine the cyclic frequency of interest. This is sent to
R2 using $n$ bits. R2 can now calculate the required scaling
factors and clip the sign bits from the spectral data of $y(t)$.
At most there are $(N/M) - 1$ scaling factors along the $f$ axis
[Ref. 7]. Since these are complex numbers, they are sent to R1
with $((N/M) - 1)2n$ bits. The resulting sign bits for $N$ samples
only need $2N$ bits to be sent to R1. The total number of bits
required to transmit all this data back and forth is
$n_t = (2Nn)/M + 2N + n$ bits for OBSCA as opposed to $2Nn$ bits
normally. Typical values are $N = 4{,}194{,}304$, $M = 4096$, and
$n = 24$ bits. These numbers lead to a reduction of approximately
25 times in the data needing to be transmitted. This is
illustrated clearly in Figure 8.
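The bit counts above are simple to evaluate. The Python sketch below uses the two expressions from the text, $2Nn$ bits for the conventional transfer and $(2Nn)/M + 2N + n$ bits for OBSCA; the function names are hypothetical and integer division is used on the assumption that $M$ divides $2Nn$.

```python
def bits_conventional(n_samples: int, bits_per_part: int) -> int:
    """N complex samples, each with real and imaginary parts of n bits."""
    return 2 * n_samples * bits_per_part


def bits_obsca(n_samples: int, m_smooth: int, bits_per_part: int) -> int:
    """Scaling factors, sign bits, and the cyclic frequency request."""
    scaling = (2 * n_samples * bits_per_part) // m_smooth  # (2Nn)/M
    signs = 2 * n_samples                                  # one bit per real/imag part
    request = bits_per_part                                # alpha sent using n bits
    return scaling + signs + request


# Typical values quoted in the text.
conventional = bits_conventional(4194304, 24)
obsca = bits_obsca(4194304, 4096, 24)
reduction = conventional / obsca
```

With the typical values quoted, the ratio evaluates to roughly 24, consistent with the approximate figure given in the text and dominated by the $2N$ sign bits.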

While the sign bit data only has to be transmitted once, each
$\alpha$ must be sent separately. Furthermore, if these values of
$\alpha$ are not sufficiently close so as to fall within the same
set of partitions, then a new set of scaling factors must be
sent as well. This, of course, would increase the bandwidth
required to transmit the data.
III.   STRUCTURES  FOR  SINGLE  CYCLIC  FREQUENCIES

A.   INTRODUCTION 

Reference 7 introduces a structural architecture which
allows highly parallel computational implementations of the
OBSCA in calculating the entire bifrequency plane. In order
to accomplish this, reference 7 describes a Basic Partitioning
Scheme (BPS) which mimics the contiguous and nonoverlapping
pattern of CSA cells in Figure 2. The idea is that once the
bifrequency plane has been appropriately tiled with partitions
based on the BPS, all the calculations within each partition
can be conducted independently of any other partition. Each
partition on the plane is mapped into a square array called
the Q array. This array is then subdivided into related
partitions called the R and S arrays. These arrays in turn
lead to a suggestion of an architecture for parallel
computation. The example in reference 7 uses $\Delta f = 1/8$, which
is achieved with $N = 16$ and $M = 2$. The number of partitions
from equation (20) is $C_s = 25$. These partitions map into the Q
array as shown in Figure 9, and Figure 10 shows how the R and S
arrays are formed from the Q array. Taking the R array as a
further example, reference 7 goes on to show that if processor
elements were arranged as in Figure 11, each partition would
be computed in parallel. Further parallelism within each
processor element was also discussed.

B.   EQUIVALENT   STRUCTURE   FOR   SINGLE   CYCLIC   FREQUENCY   OF 
INTEREST 

Continuing  with  the  concept  that  it  would  be  advantageous 
at  times  to  calculate  a  line  of  frequency  for  a  given  cyclic 
frequency  of  interest,  the  following  structure  is  presented. 
Based on the ideas found in reference 7, only minor
modifications need to be made to allow highly parallel
computation along a single $\alpha_0$.

The architecture in reference 7 is shown in Figure 12.
Since each partition requires a contiguous band of spectral
data, the process begins by transferring the spectral data
from the output of the FFTs to the X and Y memory buffers of
the processor elements. The data is broadcast sequentially
from one end of the spectral band to the other, and each memory
buffer intercepts and stores its appropriate band of data.
Once all this is accomplished, all the partitions are computed
simultaneously.

Upon examining Figures 11 and 12 closely, it is of
interest to note that for a given $\alpha$, the X and Y memory
buffers are each used, at most, once. This readily suggests that
a decoding array can be implemented which uses $\alpha$ as an input
and multiplexes the X and Y memory buffers appropriately to
compute the correct partitions associated with the given $\alpha$.
Figure 13 shows the R and S arrays with the X and Y memory
buffers. Figure 14 shows how a given $\alpha$ fixes the use of each
X and Y memory buffer to calculate the partitions for
$\Delta f = 1/8$. Once each partition is computed, it is passed on to
be scaled as needed. Within each processor element, the same
architecture suggested in reference 7 can be used without
modification.

The advantage of this structure, of course, is that far
fewer processor elements are needed to perform the desired
calculations. At most, $(N/M) - 1$ processor elements would be
required, as mentioned earlier. Although it would be
infrequently required to calculate along the $f$ axis, as that is
the normal signal power spectrum, if fewer processor elements
were used there might not be enough elements to calculate all
the required partitions in a single pass. This would require
that some processor elements compute more than one partition,
which slows performance and defeats the advantages inherent in
the parallel design.




IV.   SUMMARY 

A.   CONCLUSIONS 

The multiplexing properties of the OBSCA allow for greatly
reduced numbers of real multiplications. However, this is not
sufficient to give OBSCA an advantage over the better time
smoothing algorithms. FAM and SSCA are still far more
computationally efficient in computing the entire bifrequency
plane than their frequency smoothing counterparts, even with
OBSCA.

It has been shown that OBSCA is much better suited for
calculating a single $\alpha$ than either FAM or SSCA. However, it
is interesting to note that while OBSCA is intended to
increase the computational efficiency of the frequency
smoothing methods, the direct application of the FSM proved to
be nearly as efficient as OBSCA.

There does exist a transmission bandwidth advantage to
using OBSCA in TDOA operations. This is especially true when
there are multiple cyclic frequencies of interest within the
same set of partitions. The advantage is diminished slightly
for each new $\alpha$ which lies in a new partition set.

Application specific integrated circuits appear to be a
natural implementation procedure for the proposed system
architecture illustrated in Figures 11, 12, and 14.
B.   RECOMMENDATIONS

In the case where there is only one cyclic frequency of
interest, it is readily apparent from this research that a
closer study of FSM and OBSCA is required to understand the
trade-offs between the two methods. It is, therefore,
recommended that further analysis be conducted concentrating
on the computational and implementational similarities and
differences that exist between FSM and OBSCA.

Further architectural study is needed to determine a
suitable Application Specific Integrated Circuit (ASIC) which
would be appropriate to OBSCA. The OBSCA's obvious potential
for implementation using massively parallel architectures far
exceeds that of the other algorithms and also requires a more
in-depth approach than given here.




Figure 1 BPSK signal on bifrequency plane
Figure 2 CSA cells tiling the bifrequency plane
Figure 3 Arithmetic unit for OBSCA
Figure 4 Complexity of real multiplies for entire plane
Figure 5 Complexity of real adds for entire plane
Figure 6 Complexity of real multiplies for single $\alpha$
Figure 7 Complexity of real adds for single $\alpha$
Figure 8 Comparison of required transmission bandwidths
Figure 9 Mapping of the BPS into the Q array
Figure 10 Decomposition of Q array into R and S arrays
Figure 11 The R processor array
Figure 12 System architecture for $\Delta f = 1/8$
Figure 13 The R and S arrays with aligned memory buffers